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<^ Abstract 

This paper considers the problem of detecting equal-shaped non-overlapping uni- 
modal peaks in the presence of Gaussian ergodic stationary noise, where the number, 
location and heights of the peaks are unknown. A multiple testing approach is proposed 
["T"1 ■ in which, after kernel smoothing, the presence of a peak is tested at each observed local 

maximum. The procedure provides strong control of the family wise error rate and 
the false discovery rate asymptotically as both the signal-to-noise ratio (SNR) and the 
search space get large, where the search space may grow exponentially as a function 
of SNR. Simulations assuming a Gaussian peak shape and a Gaussian autocorrelation 
function show that desired error levels are achieved for relatively low SNR and are ro- 
bust to partial peak overlap. Simulations also show that detection power is maximized 
when the smoothing bandwidth is close to the bandwidth of the signal peaks, akin to 
the well-known matched filter theorem in signal processing. The procedure is illustrated 
i in an analysis of electrical recordings of neuronal cell activity. 
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oo ■ 1 Introduction 

O. 

Peak detection is a common statistical problem in the analysis of high-throughput data. 
Examples include identification of binding sites on the genome (Zhang et al., 2008), DNA 
sequencing (Li and Speed, 2000), identification of proteins in mass spectrometry (Morris 
^ ■ et al., 2006; Harezlak et al., 2008), detection of action potentials in neuronal recordings 

(Baccus and Meister, 2002), detection of heart beats in electrocardiograms (Arzeno et al., 
2008), and identification of signatures in galaxy spectra (Brutti et al., 2007). A crucial step 
in the analysis of these data, both for dimension reduction and consequent inference, is the 
detection of an unknown number of signal peaks with a temporal or spatial structure in 
the presence of background noise. The main challenge in these problems is that both the 
number of peaks and their location are unknown. In addition, the data often consist of a 
single long sequence with no replicates. 

While there are many peak detection algorithms in the scientific literature, they tend to 
be geared toward specific applications and their performance is often evaluated empirically. 
In particular, peak detection algorithms often require a threshold, but the choice of the 
threshold is ad-hoc and does not take into account the error inflation produced by multiple 
testing. Errors in peak detection can lead to erroneous conclusions in later steps of the data 
analysis. There is a need to approach peak detection from a formal statistical viewpoint. 
Our objective is to develop a general statistical procedure for identifying signal peaks in the 
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presence of background noise with proven error control, while at the same time being easy 
to implement and efficient to run on large data sets. 

In this paper, we consider the specific problem of detecting equal-shaped non-overlapping 
unimodal peaks in the presence of Gaussian stationary noise, where the number, locations 
and heights of the peaks are unknown. We have chosen this particular setting for its ana- 
lytical tractibility, but we believe that it captures the essence of the general peak detection 
problem and can serve as a theoretical basis for future applications in the particular scientific 
disciplines. 

The assumption of not knowing the number of peaks is key. If the number of peaks 
were known, then the unknown locations could be estimated solving a nonlinear least- 
squares problem (O'Brien et al., 1994; Li and Speed, 2000, 2004). The main difficulty of 
this approach is that not knowing the number of peaks implies not knowing the number 
of location parameters to be estimated. As a consequence, the problem becomes akin to 
the model selection problem in regression (Li and Speed, 2000, 2004). Alternative solutions 
using L\ regularization include direct penalization of the estimated signal (O'Brien et al., 
1994) and a modification of the LASSO algorithm where the penalty is applied to the 
difference between consecutive coefficients to account for the ordered structure of the data 
(Tibshirani et al, 2005). 

Rather than an estimation problem, we view peak detection as a multiple testing prob- 
lem, where, at each of a set of locations, a test is performed for whether the signal is 
nonzero at that location. Our approach can be viewed essentially as a search for peaks over 
the length of the data. The idea has been used elsewhere (Yasui et al., 2003; Morris et al., 
2006; Chumbley and Friston, 2009) but not formally for multiple testing, and is motivated 
as follows. A full search for peaks would require testing every single observed point for 
significance. However, if the peaks are assumed unimodal and non-overlapping, dimension- 
ality can be reduced dramatically by testing only at locations that resemble peaks, that is, 
local maxima of the observed sequence. In addition, it is known that signal-to-noise ratio 
(SNR) can be improved by local smoothing such as averaging over a local neighbourhood. 
Based on these ideas, our proposed algorithm consists of the following steps: 

1. Kernel smoothing 

2. Candidate peaks: find local maxima of the smoothed sequence. 

3. P-values: compute a p-value at each local maximum, defined as the probability of 
peering the observed intensity of the local maximum or higher according to the dis- 
tribution that would be expected if we only observed noise. 

4. Multiple testing: use a multiple testing procedure to find a global threshold and declare 
significant all peaks exceeding that threshold. 

For Step 4, we consider two standard multiple testing procedures: Bonferroni and Benjamini- 
Hochberg (BH) (Benjamini and Hochberg, 1995). Our peak detection algorithm has the 
advantage of being simple, easy to remember and efficient to implement for large data sets. 
At the same time, we believe it to be powerful and we show that it provides guaranteed 
global error control. As measures of global error we consider both family-wise error rate 
(FWER), to be controlled by the Bonferroni procedure, and false discovery rate (FDR) 
(Benjamini and Hochberg, 1995), to be controlled by the BH procedure. For simplicity, we 
concentrate on positive signals and one-sided tests, but this is not crucial. 
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Our proofs of error control assume that the noise is a continuous ergodic stationary 
Gaussian process. This assumption permits a closed form formula for computing the p- 
values corresponding to local maxima of the observed process. The distribution of the 
height of a local maximum of a Gaussian process is not Gaussian but has a heavier tail, and 
its computation requires careful conditioning based on the calculus of Palm probabilities 
(Cramer and Leadbetter, 1967; Adler et al., 2010). This is crucial to ensure that p- values 
are valid. 

Another interesting and challenging aspect of the proof of error control is the fact that 
the number of tests, which is equal to the number of local maxima in a given interval, 
is a random quantity. Proofs of error control in the multiple testing literature usually 
assume that the number of tests is fixed. In our proofs, we overcome this difficulty using 
an asymptotic argument for large search space, so that the error behaves approximately as 
it would if the number of tests were equal to its expected value. 

In the proofs, the asymptotics for large search space are combined with asymptotics for 
large SNR. The large SNR assumption helps solve the issue of identifiability of peaks, as 
it implies that each signal peak is represented by only one observed local maximum with 
probability tending to one. The asymptotic rates, however, are such that the search space 
is allowed to grow much faster than the SNR; exponentially faster. In this sense, we do not 
consider the large SNR assumption restrictive. 

For concreteness, we conduct simulations assuming a Gaussian peak shape and a Gaus- 
sian autocorrelation function. Our simulations confirm that moderate values of SNR are 
enough to provide desired error levels. Our simulations also show that the performance is 
maintained under a substantial amount of overlap between neighboring peaks. 

Defining detection power as the expected fraction of true peaks detected, we prove 
that the peak detection algorithm is consistent in the sense that its power tends to one 
under the above asymptotic conditions. We then use simulations to address the question 
of optimal bandwidth. In the above Gaussian autocorrelation model, we find that the 
detection power is maximized when the smoothing bandwidth is close to the bandwidth of 
the signal peaks, slightly larger when the noise is uncorrelated and getting smaller as the 
amount of autocorrelation in the noise increases. This result is similar to the well-known 
matched filter theorem in signal processing, which states that the SNR after smoothing is 
maximized when the smoothing kernel matches the shape and width of the signal (Simon, 
1995; Pratt, 1991). Most noticeably, the optimal bandwidth for peak detection as multiple 
testing is different, in fact much larger, than the usual optimal bandwidth for nonparametric 
estimation. A distinction between our setting and the matched filter theorem is that in many 
applications of the latter in digital communications and radar screening, the time of testing 
is prespecified, while in our case it is random. 

We illustrate our procedure with a data set of neural electrical recordings, where the 
objective is to detect action potentials representing cell activity (Baccus and Meister, 2002). 
The data analysis takes advantage of the sparsity of the signal in order to estimate the 
parameters of the noise process. 

The rest of the paper is organized as follows. Section 2 presents the theoretical results. 
Section 3 presents the simulation results. Section 4 presents the data example. Section 5 
summarizes. Proofs are given in Section 6. All the simulations and the data analysis were 
implemented in R. 
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2 Theory 



2.1 The model 

Consider the signal-plus-noise model 

y(t)=n(t) + z(t), teR (1) 

where fx{t) is the signal we wish to detect and z(t) € C 2 is stationary ergodic zero-mean 
Gaussian noise. The signal /j,(t) is a (sparse) train of positive peaks of the form 

Ai(t) = Y, a Mt ~ Tj), h b (t) = -h ( - J (2) 

j=—00 

where aj,b > 0. The peak shape h(t) > is assumed unimodal with mode at t = and 
no other critical points within its support S = {t : h(t) > 0}, where S is assumed to be 
compact and connected. Assume that h(t) has unit action f^htydt = 1. Let 

1 ft\ 

w~{t) = —w — , 7 > 

7 \7/ 

be a unimodal kernel with compact and connected support and w^(t) dt = 1. Define 

h y (t) = w y (t) * h h (t) (3) 

with support Sj = {t : /i 7 (t) > 0}. We assume that h^{t) is unimodal with mode at some 
interior point of S not necessarily equal to 0, with no other critical points. In addition we 
assume that /i 7 (i) is twice differentiable in the interior of S 7 . Note that if, for example, 
both h(t) and w(t) are truncated Gaussian functions, then all the assumptions made on 
hy(t) above are valid. 

Finally, the smoothed signal and smoothed noise are defined as 

oo 

jU-y(t) = Wj(t) * fl(t) = Y^ CLjhy(t — Tj), Zy(t) = Wy(t) * z(t) . (4) 

j=-oo 

We require that the supports Sj i7 = {t : hj(t — Tj) > 0} do not overlap for all j. In this 
sense the signal can be considered sparse. 



2.2 The procedure 

Suppose we observe y(t) in the segment T = [— L/2, L/2], which contains Jl peaks. The 
objective is to identify a set of locations f\, . . . , that are close in location and in number 
to the true set of peak locations t\,. . . ,tj l , while controlling the probability of obtaining 
false peaks. Consider the following procedure. 

Procedure 2.1. 

1. Kernel smoothing: Construct the process 

/oo 
Wry(t — s)y(s) ds, (5) 
-oo 

where we ignore boundary effects at ±L/2. 
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2. Candidate peaks: Find all the local maxima of x 7 (t) in [— L/2, L/2], i.e. find the set 



L L 
2"' 2" 



£ 7 (i) 



cfc 7 (t) 
dt 



0, x 7 (i) 



d 2 x^(t) 
dt 2 



< 



(6) 



3. P-values: For each t 6 T calculate the p-value for testing the hypothesis 

Ho(t) : fi(t) = vs. n A {t) : fi(t) > 



4. Multiple testing: Apply a multiple testing procedure on the set of p-values and declare 
significant all peaks whose p-values are smaller than the corresponding threshold, 
which may depend on 7 and L. 

The idea behind Procedure 2.1 is very simple: smooth, find local maxima, feed the 
list of local maxima into any standard multiple testing procedure. These steps are easy 
to remember and implement. The first two steps are straightforward. The last two need 
careful study, which we do next. 

Step 3 is detailed in Section 2.3 below. For Step 4, we use the Bonferroni procedure 
to control FWER and the BH procedure to control FDR. Let rh be the number of local 
maxima in T, i.e. the number of tested hypotheses. To apply the Bonferroni procedure at 
level a we compare each p-value to the threshold a/m and reject all the hypotheses whose 
p-values are below the threshold. To apply the BH procedure at level a we first order the 
p-values and then compare the ordered i-ih p-value with ia/m. Defining k as the maximal 
index for which the ordered p-value is smaller than the corresponding threshold, we reject 
k hypotheses with the k smallest p-values. More details are given in the Sections below. 



2.3 P-values 

A crucial step in implementing any marginal multiple testing procedure is calculating p- 
values. P-values are always computed under the complete null hypothesis of no signal 
anywhere. For Step 3 of Procedure 2.1, we proceed as follows. 

Definition 2.2. Assume the model of Section 2.1 with fi(t) = 0, Vt, so that x 7 (i) = z 7 (i), 
given by (4). Let F 7 (u) denote the right cumulative distribution function (cdf) of z 7 (t) at 
the local maxima t G T, 

F 7 (u) = p{^ 7 (t) > u tety (7) 

Then the p-value of the observed x 1 (t) at t G T is 

p 7 (t) = F y [x y (t)], tef. (8) 

The probability in (7) follows the Palm distribution for maxima and we use its properties 
to compute an explicit formula for p-values. One needs to be careful while computing this 
probability to avoid bias because of two reasons. First, although we use the usual symbol 
for conditioning, this is not the usual conditioning event. Computing this probability by 
usual conditioning gives the Gaussian distribution. However, computing the probabilities 
just at local maxima based on the Gaussian distribution will lead to the biased down p- 
values. Second, we are conditioning on an event of probability zero. See Adler et al. (2010, 
Ch. 6) for a more detailed discussion. The formula for evaluating the required distribution 
(7) is given by Proposition 2.3 below. 
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Proposition 2.3. Assume the model of Section 2.1 and the procedure of Section 2.2. Under 
the complete null hypothesis (i(t) = 0, Vi, define the moments 



a 1 = var[z 7 (i)], A 2 , 7 = var[i 7 (i)], A 4i7 = var[z 7 (i)]. (9) 



Then, 



where 




F» = 1 - $ W^I + J -J** - * UW , (10) 



A = ^A 4 , 7 - A| 7 (11) 
and (j)(x), Q(x), are the standard normal density and cdf, respectively. 

This result was proven by Cramer and Leadbetter (1967, Ch. 10), using the well known 
Kac-Rice formula (Rice, 1945), (Adler and Taylor, 2007, Ch. 11). The proof, which we 
omit here, is based on other intermediate results that we will also need later, and so we 
state them in the next lemma. 

Lemma 2.4. Let z(t) G C 2 be an ergodic stationary zero-mean Gaussian process with 
spectral moments a 2 = var[z(i)], A 2 = var[i(i)] and A4 = var[£(i)] ; defined on a compact set 
Tel with non-empty interior and finite measure \T\. 

1. Define respectively the number of local maxima in T and the number of local maxima 
in T that cross the threshold u as 

m(-oo; T) = # {t € T : z{t) = 0, z(t) < 0} 

m(u; T) = #{i£T: z(t) > u, z{t) = 0, z(t) < 0} . 



Then 



E[m(-oo; T)] = |T|E[m(-oo; [0, 1])], 
E[m(u;T)] = \T\E[rh(u; [0, 1])], 



where E[m(— 00; [0, 1])] = A/A4/A2 / ^(2vr) is the expected number of local maxima and 
E[m(«; [0, 1])] is the expected number of local maxima above the threshold u in [0, 1], 
with expression given in Cramer and Leadbetter (1967, Ch. 10). 

2. The right cdf of the heights of the local maxima of z{t) is given by 

P { , w >.| iW -o, iW <»}-^m r . (i 2 ) 

Proposition 2.3 follows directly from Lemma 2.4 applied to the process z 7 (i). In par- 
ticular, (10) follows directly from evaluating (12). Referring to (10), note that, for large 
enough u, 

I2ttX 2 ^ f u 



f > )k Va^*UJ' (13) 

Therefore, the tail of the distribution is proportional to a Gaussian density, so it behaves 
like a Rayleigh distribution. 

A peculiar characteristic of Procedure 2.1 is that the number of tests m, which is the 
same as the number of p- values, is random. In particular, under the complete null hypothesis 
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fi(t) = 0, Vt, the expected number of tests can be computed explicitly applying Lemma 2.4 
to the process z 7 (t) as 

E[ro(-oo; [-L/2, L/2])] = ^J^- (14) 

The quantities a 2 , A2 i7 , and A4 i7 in Proposition 2.3 depend on the kernel Wj(t) and 
the autocorrelation of the original noise process z{t). A specific form of the p- values and 
the expected number of tests may be obtained for the following Gaussian autocorrelation 
model. 

Example 2.5 (Gaussian autocorrelation model). Assume the complete null hypothe- 
sis. Suppose w(t) = cj)(t)l[— c, c] is a truncated normal density and assume that the auto- 
covariance function of the noise process z(t) is proportional to a normal density, i.e. 

z{t)=aj w„(s-t)dB(8), w u {t) = ^-<p(^j 

where B(s) is standard Brownian motion and v > 0. Ignoring the truncation at ±c, the 
required moments in (10) or (14) are 



a 2 . a 2 3a 



-<-2^e A2 ' 7 = w A4,7 = v^' ? = ^+^- (15) 

Derivations are given in Section 6.1. Therefore, for the Gaussian autocorrelation model, 

E[m(-oo;[-L/2,L/2])] = 

If w (t) is truncated at ±c where c is large enough, say c = 4, the moments in (15) are a 
good approximation for the same moments computed in the truncated version. 



2.4 Error rates definitions 

In this subsection we define the two error rates to control, FWER and FDR. Let us define 
first what is considered a false discovery and what is considered a true discovery under the 
true model. 

Definition 2.6. Let Sj = {t : hb(t — Tj) > 0} be the support of h^t — Tj). Define the signal 
region Si and null region So respectively by 

Si = (J Sj and S = [-L/2, L/2] \[{jsA. 

j=i y=i / 

It is known that kernel smoothing expands the signal support. This effect requires care 
as it increases the probability of obtaining false positives in the regions neighboring the 
signal (Pacifico et al., 2004b). The larger the bandwidth 7, the greater the distortion. In 
particular, the support defined with respect to u> 7 is larger than Sj, and we define it formally 
next. 
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True signal 




Figure 1: True and smoothed signal. 



Definition 2.7. Fix 7 > and define <Sj i7 = {t : h^it — Tj) > 0} where h^{t) is given by 
(3). Define the expanded signal region Si j7 and reduced null region §0,7 respectively by 

§1,7 = U %7 and S o, 7 = [-L/2, L/2] \ I (J S j7 7 ] . 

The above definitions are illustrated schematically in Figure 1. Some useful equalities 
are: (1) §0 U §1 = §0,7 U §1,7 an d (2) §1 7 \ Si = §0 \ §0,7- We refer to the latter set, 
the difference between the expanded signal support and the true signal support, as the 
transition region. 

We follow the classical view of error definitions. A significant local maximum that 
belongs to §0 is considered an error. A significant local maximum that belongs to §1 is 
considered correct. Moreover, if more than one local maximum occur within the same peak, 
all significant local maxima are considered as correct. However, all these local maxima are 
counted as one peak for the purposes of power, defined formally below in Section 2.8. Thus 
power is not inflated. Nevertheless, this situation does not affect the theoretical results 
because under our asymptotic assumptions (Theorem 2.12) each peak is represented by one 
local maximum with probability tending to 1. 

To simplify the notation, we define the number of falsely rejected local maxima using 
the threshold u as: 

V(u) = #{tefr\S : Xj(t) > u} and V^u) = #{i G f D §0,7 : aJ 7 (t) > u}, 

where V(u) counts false discoveries in the full null region and V^(u) counts them only in 
the null region minus the transition region. Similarly, we define the number of correctly 
rejected hypotheses using the threshold u as 

W{u) = #{t G f n Si : Xj(t) > u} and W 7 («) = #{t G f D §i, 7 : x 7 (i) > «}, 

where W(u) counts discoveries in the true signal region and W 7 (it) counts them in the signal 
region plus the transition region. The total number of rejections using threshold u is 

R{u) = V{u) + W{u) = #{t G f : x 7 (t) > u}. 
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The number of tests where the null hypothesis is true is denoted by 

m = V(-oo) = #{t G T n So}) 
and the total number of tests is 

m = R(-oo) = #{t G T}. 
The number of tests where the null hypothesis is false is denoted by mi = rh — rho- 

Definition 2.8. Define the family-wise error rate (FWER) for any threshold u as the 
probability that there exists at least one local maximum of x 7 (i) in the null region above 
the threshold u: 

FWER(w) = P{V(u) > l} = p|TnS o /0and max z 7 (t) > u\ . (17) 

Definition 2.9. Define the false discovery rate (FDR) as the expected proportion of falsely 
rejected hypotheses using a fixed threshold u: 

FDR ("» = E {flivT}- (18) 

Note that if f is empty then rh = and V(u) = and so FWER(u) = FDR(u) = 0. 
However, the probability of this event goes to zero as L increases. 



2.5 Weak control of FWER 

In this subsection we assume the complete null hypothesis /i(t) = 0, V£ G [-L/2, L/2]. Note 
that under this assumption rh = rho. 

In Procedure 2.1, after Step 2 one has a list of rh local maxima. The Bonferroni pro- 
cedure at level a defines a threshold of the form a/fh and rejects all hypotheses whose 
p-values are below this threshold. Note that in the usual Bonferroni procedure the number 
of tested hypotheses is constant. In our case the number of tested hypotheses is random, 
therefore we consider two versions. We refer to the version in part (2) of Theorem 2.10 as 
the Bonferroni procedure, while the version is part (1) can be viewed as the 'limit' of the 
Bonferroni procedure. 

Theorem 2.10. Assume the model of Section 2.1 and the procedure of Section 2.2. Let 
H(t) = 0, Vt. Fix a > 0. 

1. Suppose the null hypothesis Ho(t) is rejected at t G T if 

^ ^® > ^=^ 1 {m)- (19) 

Then FWER(n* on ) < a. 

2. Suppose the null hypothesis ~Ho(t) is rejected at t G T by a Bonferroni procedure with 
rh tests, i.e. if 

Py{t) < ° ^ X 7 (t) > UBon = F" 1 (") . (20) 

If rh = we define a/fh as infinity. Then limsup^^.^ FWER(nBon) < ex. 
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The proof of Theorem 2.10 is given in Section 6.2. The first part of the theorem is 
not asymptotic and its proof is a direct consequence of the definition of p- values. The 
threshold « Bon in (19) is deterministic. For example, in the Gaussian autocorrelation model 
this threshold can be computed by substituting (16). The threshold «B on in (20) depends 
on the random quantity m and is equivalent to applying the Bonferroni procedure on the 
random set of local maxima, T. The proof of the second part is based on the fact that by 
the weak law of large numbers m is close to its expectation E[m] for large enough L. 

Using (13), the thresholds in Theorem 2.10 may be approximated by 



u 



Bon 



UBon ~ 1 




\ 



21 ° S 



2,7 



\ 



2 log 



a 



Notice that asymptotically both thresholds have a similar form to the universal threshold 
of Donoho and Johnstone (1994, 1995). 



2.6 Strong control of FWER 

Let us start with brief reminder about the assumptions of the true model. We observe y(t) 
as defined in (1) in the segment T = [— L/2, L/2], where the signal contains Jl peaks at 
locations Tj £ Sj for 1 < j < Jl, where Sj is the finite support of the jth peak. Recall that 
/i 7 (i) = u> 7 (t) * h(t) is the peak shape after smoothing. It achieves its supremum at a single 
point Tj a G Sj which is not necessarily the same as Tj and has no other critical points. We 
define the signal after smoothing as 

Jl 

3=1 

The next theorem describes the strong control of FWER for this model. 

Theorem 2.11. Assume the model of Section 2.1 and the procedure of Section 2.2. Fix 
a > 0. 

1. Suppose the null hypothesis Ho(t) is rejected at t £ T if 

^ (4)< eM ~ *-<*> > = F " (ipj) ' (22 » 

Then, for all L, lim sup FWER (ug on ) < a, as aj — > oo, Vj. 

2. Suppose the null hypothesis Unit) is rejected att£Tby the Bonferroni procedure with 
m tests, i.e. if 

p 7 {t) < " ^=> x 7 (t) > neon = F- 1 (®) . (23) 

If fh = we define a/rh as infinity. Then limsupFWER(uBon) < ot, as L — > oo and 
aj — > oo, Vj, such that for all j, Laj(p(Kaj) — > for any constant K . 
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Notice that the result in the first part of Theorem 2.11 is asymptotic, in contrast to 
the result in the first part of Theorem 2.10. In addition, u Bon defined in (22) cannot be 
computed, even assuming the Gaussian autocorrelation model, since the true signal region 
is unknown. In practice we recommend to apply the Bonferroni procedure defined in the 
second part of Theorem 2.11, which can always be implemented and provides asymptotic 
strong control of FWER. The rates in Theorem 2.11 part (2) may be achieved, for instance, 
if L increases exponentially with a,j . 

The proof of Theorem 2.11 is given in Section 6.3. There we show first that the expected 
number of local maxima in the transition region Si i7 \§i converges to (Lemma 6.3). Then 
we use arguments similar to those in the proof of Theorem 2.10. 

2.7 Control of FDR 

In many applied problems the expected ratio of false discoveries is a more appropriate error 
rate to control. The next theorem states that applying the BH procedure on the random 
set of local maxima controls the FDR asymptotically. 

Theorem 2.12. Assume the model of Section 2.1 and the procedure of Section 2.2. Let 
Xj\t) be the ith ordered value off in ascending order. Let 

«BH = u k , k= mm {i : xty (t) > Ui} (24) 

l<i<m ' 

be the threshold obtained by applying the BH procedure at the level a on the random set T 
with m / tests, where 

„_i f fh — i + \ \ 

Ui = F„ a , 1 < i < m 

' \ m J 

are the constants corresponding to the BH procedure. Reject m — k + 1 hypotheses corre- 
sponding to Xj(t) > i2bh- If such k does not exist or m = reject nothing. Assume that for 
L — > oo and aj — > oo for all j, 

1. the number of peaks Jl increases at the same rate as L or slower, i,e. Jl/L — > A\, 
where < A\ < 1. 

2. L — > oo and aj — > oo, Vj such that Vj, L(f>(Kaj) — > for all 5 > and any constant 
K. 

Then, 

limsupFDR(£tBH) ^ a - 

The rates of this theorem may be achieved if L grows exponentially with aj. The 
proof is given in Section 6.4. First we show that: (1) a local maximum exists in a small 
neighbourhood of a true peak mode with probability tending to 1; (2) this local maximum 
is rejected for any fixed threshold with probability tending to 1 (Lemma 6.4). The proof of 
the theorem is then based on the following arguments. It is known that the threshold of the 
BH procedure can be viewed as the largest solution of the equation aG(u) = Fj(u), where 
G(u) is the empirical right cumulative distribution function of x 7 (i), t G T (Genovese et al., 
2002). The ergodic assumption guarantees that G(u) has a limit. Replacing G(u) by its 
limit we find that the asymptotic solution it BH satisfies 

F>&H) = A 1+ EM0, 1]](1- a)' (25) 
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We show that under the conditions of the theorem, the threshold Ug H asymptotically con- 
trols the FDR below the desired level a. Since F^(ubii) —> ^-(^bh)' the FDR level will be 
asymptotically controlled as well when using ubh instead of w BH . 

Notice that, in contrast to the Bonferroni procedure, where the deterministic threshold 
u Bon m (22) grows unbounded with increasing L, the asymptotic threshold for the BH 
procedure u% H in (25) is finite, and depends on the asymptotically fixed proportion of false 
null hypotheses, A\. 

2.8 Power 

We define the statistical power of Procedure 2.1 as the expected fraction of true discovered 
peaks: 



We discuss power in two ways. First, we show that the Bonferroni and BH procedures 
in (23) and (24) respectively are consistent in the sense that the power tends to 1 for any 
fixed value of the smoothing parameter 7. Second, we discuss what value of the smoothing 
parameter 7 maximizes the power for finite samples. 

2.8.1 Power of the Bonferroni and BH procedures 

The consistency of both procedures is given in the next theorem. 
Theorem 2.13. 

1. Under the conditions of Theorem 2.11 the power of the Bonferroni procedure (23) 
converges in probability to 1. 

2. Under the conditions of Theorem 2.12 the power of the BH procedure (24) converges 
in probability to 1. 

The proof of Theorem 2.13 is based on the next lemma. 

Lemma 2.14. 

1. Under the conditions of Theorem 2.11, Wg on /[aj/i T (0)] — > in probability. 

2. Under the conditions of Theorem 2.12, u^ u /[ajh 7 (0)] — > in probability. 

As L and aj, j = 1, . . . ,m, go to infinity, the deterministic Bonferroni threshold (22) 
goes to infinity as well because E[m] goes to infinity. Lemma 2.14 states that Oj7i T (0) goes 
to infinity faster than Ug on does. In contrast, the asymptotic BH threshold (25) does not 
depend on L but only on the asymptotically fixed proportion of false null hypotheses, A±. 
Therefore it does not increase with L but is asymptotically constant. 

Note that the statements in Lemma 2.14 are about deterministic thresholds, while the 
statements of Theorem 2.13 are about random thresholds. As in the proofs of the previous 




(26) 



= P < T n Sj ^ and max x 7 (t) > u 
\ tefnSj 
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theorems, we use the fact that the gap between the deterministic and random thresholds 
for both the Bonferroni and BH procedures goes to 0. 

It is known that in general, if there exists a signal anywhere, the power of the BH 
procedure is larger then the power of Bonferroni procedure (Benjamini and Hochberg, 1995). 
This is also true in our case with respect to Ug on and To see this, note that for any 

fixed and large enough L the thresholds can be approximated by 

*V(«BoJ = EpiJ ~ m li7 + E[m , 7 ] ^ 

and 

p (u * \ „ aifl hi (28) 

7V bh; Ai+E[f?io )7 ;[0,l]](l-a) ~ mi, 7 + (l-a)E[mo i7 ]" k ; 

It immediately follows that if rh\ n > 1, the threshold Ug on is larger than the threshold itg H , 
promising a larger power for the BH procedure. 

2.8.2 Optimal choice of 7 

Here we discuss the best choice of 7, i.e. the value of 7 that maximizes the power (26). 
To maximize the probability of local maxima to exceed a given threshold u under the true 
model is a difficult problem. Therefore, we turn to less formal discussion in this section. 

Lemma 6.4 in Section 6.4 shows that for any single true peak with mode at t = 0, a 
local maximum exists in a small neighbourhood Ij e of t = with probability tending to 1. 
The power may be approximated as 

Power(it) = P < f fl Sj / and max x 7 (t) > u > 



= P<Tn/j £ ^0 and max x y (t) > u > 
{ tefni^ J 

+ P { f n (Sj \ Ij e ) ^ and max x y (t) > u \ 
{ tefniS^e) J 

«P<Tn/j £ ^0 and max x 7 (t) > it > 
[ ' tefni^ J 

« P < max x~(t) > u I a local maximum exists in e > . 

Therefore, heuristically, the optimal value of 7 should be close to the value of 7 that maxi- 
mizes P(x 7 (0) > u). This value of 7 is approximately given by: 

/ a,7i 7 (0) — it\ / /i 7 (0) m 

argmax P(x 7 (0) > it) = argmax <E> I — — L — ] = argmax 1 



hy (0) f^w^htWds (29) 
« argmax — 1 = argmax -j^^^=^^^= , 

where a and 07 are the standard deviations of the observed and smoothed processes y(t) 
and Xy(t), respectively. The approximation may be justified for both procedures by Lemma 
2.14. In the last expression in (29), the optimal value of 7 is that which makes w 7 (t) closest 
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to hb(t) in an I? sense. In particular, if w(t) = h(t), then the optimal value of 7 is 6. This 
result is similar to the well-known matched filter theorem for detecting a single signal peak 
of known shape at a fixed time t. 

Example 2.15 (Gaussian autocorrelation model). Suppose h(t) = w(t) = <p(t), the 
standard normal density, and assume that the autocovariance function of the noise process 
z(t) is proportional to a normal density as in Example 2.5. In this case 



We show in the simulations below that the optimal 7 is indeed close to the one obtained by 
eq. (30). 

3 Simulation Studies 

3.1 Performance for finite space and SNR 

A simulation study was carried out to investigate the FDR and FWER levels as well as 
the power of the proposed algorithm based on the Bonferroni and the BH procedures for a 
finite range L and for a finite SNR. Our simulation study is based on the model described 
in Section 2.1 and Example 2.5, with 20 equally spaced and non overlapping peaks of the 
same height. The peak shape, h(t), is the normal density with standard deviation 6 = 3, 
truncated at c = ±26. Peaks were centered at Tj = lOOj — 50, for j = 1, . . . , 20 and aj in 
(2) for all j is chosen to be 10 and 15 in two different scenarios, giving a moderate and a 
strong SNR respectively. The noise is a stationary zero-mean Gaussian process as defined 
in Example 2.5 with a = 1 and v = 0, 1 and 2. We sampled the time axis at L = 2000 
equally spaced points. Figure 2 presents a fragment of the simulated data. Note that the 
height of the peaks is aj/\f2nb 2 . 

Following Procedure 2.2, we constructed the process x 7 (t) = y(t) * w^(t), where u> 7 (i) 
is a Gaussian kernel and 7 was equally spaced between 1 to 6.5 in steps of 0.5. For each 
configuration we computed the power and the error level based on 10,000 replications. 
The FWER level was computed as the proportion of replications for which at least one 
false discovery occurred. The FDR level was computed as the mean proportion of false 
discoveries out of the total number of discoveries. The power is presented as the mean 
number of truly discovered peaks out of the 20 peaks. 

Figure 3 presents the FWER and FDR levels of the Bonferroni and the BH procedures 
for all the studied configurations. The nominal level for both procedures was set to 0.05. 
For relatively large values of 7 the Bonferroni procedure may exceed the prespecified error 
level. This happens due to the broadening of the signal: many of the rejected local maxima 
correspond to real peaks, but due to large value of 7, the modes are shifted away from the 
true mode locations to the transition region, and the discoveries are no longer considered 
as correct. FDR is less affected because FDR is an expected ratio, so larger 7 also results 
in fewer discoveries in the transition region. The effect of the broadening of the signal is 
less for both procedures for larger SNR. Note that for any finite SNR one can choose large 
enough 7, so the phenomenon of the broadening of the signal will lead to exceedance of the 
error rate. Recall, however, that asymptotically there should be no local maxima in the 
transition region. There is no strong dependency on SNR and on the noise autocorrelation 
parameter v. 




v < b/y/2 
v > b/V2 



(30) 
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Figure 2: A fragment of the simulated data. 



Figure 4 presents the power of the Bonferroni and the BH procedures, respectively, for all 
studied configurations. As expected, (1) the power of the BH procedure is greater than that 
of the Bonferroni procedure for all studied configurations; (2) the power of both procedures 
is greater for larger SNR. For large SNR the power of the BH procedure is almost a flat 
function of 7. This means that for large SNR the selection of 7 is not very important as 
long as it is near the signal width b. 

Figure 4 shows that there is one value of 7 in each configuration for which the power is 
maximized. The optimal value is usually around 6 = 3, but it depends on the parameter v. 
The larger u, the smaller the value of the optimal 7 as expected from (30). Note that the 
error level is always controlled if one selects the 7 value close to the optimal one. To get 
more precise results about the optimal value of 7 we performed an additional simulation 
study where 7 takes values in the range 1 to 3.5 in steps of 0.1. The empirical optimal 
values of 7, as well as the optimal 7 from equation (30), are summarized in Table 1. 

As explained in Section 2.7, each peak is represented by no more than one local maximum 
with probability tending to 1 as the SNR increases. In our simulation we computed the 
probability of obtaining more than one local maximum within the same peak in the finite 
setting. Figure 5 shows that this probability is a decreasing function of 7, and a decreasing 
function of the SNR. Note that near the optimal value of 7 this probability is negligible. 
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Figure 3: FDR level of the BH procedure (solid), FWER level of the Bonferroni procedure 
(dashed). 



3.2 Overlapping peaks 

In the second part of our simulation we test the performance of our algorithm for a non- 
sparse signal. In the previous simulation, the distance between two adjacent peak locations 
Tj, which we denote by D, was set to 100. We now test how the distance between the peaks 
D affects the power and the error level of both procedures. 

For the case where aj = 10, v = and 7 = 3.2 (which is optimal for the Bonferroni 
procedure) and 7 = 2.0, we reduce D and L while keeping fixed the proportion between 
the signal region, §1 and the total tested region, S. The first row in Figure 6 presents the 
error levels of the Bonferroni and BH procedures for different values of D. Recall that the 
support of each peak has a length of 13 (r ± 26), so peaks start to overlap for D < 46 = 12. 
It can be seen that the error levels of both procedures are controlled regardless of the value 
of D. 

Overlapping of peaks requires care of the definition of power. In this section the power 
was defined as the number of truly rejected peaks, where 'truly' means that the rejected local 
maximum belongs to the peak support. Since any rejected local maximum must represent 
just one peak, we divide the overlapping part of the support to two equal parts and add the 
left half to the rejection region of the peak from the left and the right half to the rejection 
region of the peak from the right. Thus, for any single peak the rejection region consists of 
the non-overlapping part of the support (in the middle of the peak) and half of each one of 
the overlapping parts. This makes the rejection region smaller than the original support. 

The power of both procedures almost does not change when D is reduced until there 
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Figure 4: Power of the BH (solid) and Bonferroni (dashed) procedures. 
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equation (30) 


3.0 


2.8 


1.0 


3.0 


2.8 


1.0 


Bon 


3.3 


2.8 


1.3 


2.9 


2.9 


1.3 


BH 


3.2 


3.1 


1.3 


3.4 


3.0 


1.2 



Table 1: The 'optimal' value of 7. 



is overlap in the supports; see the second row in Figure 6. Because peaks in overlapping 
positions are combined, the number of local maxima in the Si region decreases. However, 
these local maxima become more significant and are usually rejected by both the Bonferroni 
and the BH procedures. In the configuration presented here for 7 = 3.2, for moderate 
overlapping (around 50%) 20 peaks are represented on average by 8 local maxima, while 
for non-overlapping supports in the same configuration 20 peaks are presented by 19.5 local 
maxima. In the latter case the mean number of rejections by the Bonferroni procedure 
is around 5 and the mean number of rejections by the BH procedure is around 11. This 
explains why for moderately overlapping supports the power of the BH procedure decreases 
and the power of the Bonferroni procedure increases. When peaks overlap by more than 
50% the power of both procedures decreases. For 7 = 2 the behaviour of power is the 
same. The power is relatively low before overlapping; therefore it increases for moderate 
overlapping and falls down for large overlapping. The last panel in Figure 6 shows that 
when peaks are close or have little overlap, small values of 7 perform better. 



17 



'nu'=0 




Gamma 

Figure 5: Probability of obtaining more than one local maximum within the same true peak 
for moderate SNR (a = 10). 

4 Data Example 

4.1 Data description and analysis 

Our data consists of 60 seconds of recordings from an electrode attempting to capture the 
activity of a single type of neuron in a salamander brain. The recorded signal was digitized at 
a sampling frequency of 10 KHz, resulting in a sequence of 600,000 measurements equally 
spaced over time. Data of these kind and over much longer time periods are routinely 
collected in neuroscience experiments (Baccus and Meister, 2002). 

This is a situation where the model of Section 2.1 can be applied directly. Since all peaks 
of interest correspond to action potentials from the same type of cell, it is reasonable to 
assume that they all have the same shape. Their intensities, however, vary according to the 
distances of the cells to the recording electrode. The depolarization (positive) component 
of the action potential is unimodal. The noise is a mixture of electrical noise and recording 
of remote cells, exhibiting roughly a large-scale stationary behaviour. Favourably, the SNR 
is high. For illustration, the (smoothed) data is depicted in Figure 7. 

Procedure 2.1 was implemented as follows. For Step 1, we used a Gaussian kernel as in 
Example 2.5. From the results of Section 3, the choice of kernel width 7 is not crucial but it 
is better if it roughly matches the width of the signal peaks. Since all peaks are assumed to 
have the same shape, we selected a few 'obvious' peaks and estimated their width. Figure 8 
shows that a scaled normal density with standard deviation of 1.5 approximates the peaks 
shape reasonably well. 
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Figure 6: Effect of distance between peaks D on the BH procedure (solid) and the Bonferroni 
procedure (dashed). 



In Step 2, rh = 54, 452 local maxima were found. For the heights of these local maxima, 
the corresponding p- values in Step 3 were computed according to formula (16), plugging in 
estimates of the spectral moment parameters a 2 , A2 and A4 as described below. 

For Step 4, we applied the BH procedure with rh = 54,452 and level q = 0.01, leading 
to 464 rejections of the null hypothesis. Figure 7 shows the smoothed data and the BH 
threshold. While many of the peaks could have been found by simple inspection, thanks 
to the high SNR, the algorithm detected many other weaker peaks. Figure 7(b) shows a 
zoomed segment of the data, in which two quite different peaks were detected, one strong 
and one weak. 

4.2 Estimation of spectral moments 

Here we describe how to estimate the noise parameters a 2 , A 2 and A4 using a simple method 
motivated by (9). It is known that the variance of an ergodic process can be consistently 
estimated by the sample variance. This suggests estimating a 2 , A2 and A4 respectively 
by the sample variance of the process, the difference process (as an approximation to the 
derivative), and the difference squared process (as an approximation to the second deriva- 
tive). However, the smoothed observed process x 7 (i) is expected to contain some signal 
and not just noise, biasing the sample variance estimators. To reduce the bias, we replace 
the standard sample variance by a nonparametric estimator that is less sensitive to extreme 
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Figure 7: (a) The neural spike data after smoothing with a Gaussian kernel of standard 
deviation 1.5. (b) Zoom in. In both panels, the stars indicate the detected peaks according 
to the BH procedure. The dashed line is the BH threshold for rejecting the null hypothesis. 




Figure 8: One 'obvious' peak (solid) superimposed on a scaled normal density with standard 
deviation 1.5 (dashed). 
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values, specifically 

a 2 = MAD 2 [x y {t)] = Med 2 {\x 7 {U) - Med [x 7 (t)]|} , 

A 2 = MAD 2 [Az 7 (t)] , (31) 

A 4 = MAD 2 [A 2 x 7 (t)] , 

where Ax(t) for any discretely sampled vector = is the sequence of differences 

{[x(ti + i) — x(ti)]/ (ti+i — ti)}f~^ . These estimators are expected to perform well if the signal 
is sparse in the sense that it occupies only a small portion of the data. 

To evaluate the accuracy of the proposed estimators, we performed a brief simulation in 
which we compared their performance with the performance of three additional estimation 
methods. The first method estimates the spectral moments by standard sample variances. 
The second method is based on the fact that for a stationary zero-mean Gaussian process 
z(t) G C 2 with autocovariance function (ACF) c(s) = E[z(t)z(t + s)], the spectral moments 
satisfy a 2 = c(0), A2 = — c(0) and A4 = c^(0). If c(s) is the empirical ACF of the noise 
process, we fit a polynomial regression c(s) = /3q + fas 2 + ^s 4 + e in the neighbourhood of 
s = 0, where even powers of s suffice because the ACF is symmetric around zero. Computing 
the derivatives of the polynomial at s = leads to the estimators a 2 = $0, A2 = —2/32 and 
A4 = 24/34. The third method estimates a 2 as in (31) and A2 by the 'crossing' estimator 
suggested by Lindgren (1974), 

2tt 1 / u 2 u 2 \ 

X2 = &2 T3 [ No{T) + exp Y Nu{T) + exp ~2 N - u{T) ) ' 

where u = 2a /3 and N U (T) is the number of upcrossings of the process x 7 (t) of the level 
u in [0, T\. The estimation of A4 is done in the same way from the process of differences 
Ax 7 (t). 

In the simulation, two discrete sequences of length 10,000 were generated. The first one 
contained only white Gaussian noise and the second one contained white Gaussian noise plus 
17 equally shaped peaks with heights all equal to 2, corresponding to high SNR. Mimicking 
the data, both sequences were smoothed with a Gaussian kernel of standard deviation 1.5. 
The exact moments a 2 , A2 and A4 were computed via (15). The estimated moments by all 
the methods were averaged over 2000 replications. Table 2 summarizes the results. 

Returning to the data, after subtracting the overall mean of 0.0537, the obtained es- 
timates via (31) were a 2 = 0.0587, A 2 = 0.0019 and A 4 = 0.0006. P-values were then 
computed for the mean-subtracted data via (10) using these estimated moments, leading 
to the results described earlier in Section 4.1. 

5 Discussion 

In this paper, we have used the heights of local maxima after smoothing as test statistics 
for identifying unimodal peaks in the presence of Gaussian stationary noise. It was shown 
that the procedure provides strong control of FWER and FDR asymptotically as both the 
SNR and length of the sequence tend to infinity, with the length of the sequence allowed to 
grow exponentially faster than the SNR. Simulations showed that the algorithm is powerful 
and that a matched filter principle applies where the optimal smoothing bandwidth is close 
to the width of the peaks to be detected. 
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Noise only 





Formula (15) 


MAD 


Var 


ACF 


Lindgren 


a 2 


0.188 


0.188 (0.007) 


0.188 (0.005) 


0.160 (0.005) 


0.188 (0.007) 


A 2 


0.042 


0.040 (0.001) 


0.040 (0.001) 


0.014 (0.0004) 


0.032 (0.001) 


A 4 


0.009 


0.023 (0.001) 


0.023 (0.001) 


0.002 (0.0001) 


0.016 (0.001) 



Noise plus sparse signal 





Formula (15) 


MAD 


Var 


ACF 


Lindgren 


a' 2 


0.188 


0.193 (0.007) 


0.201 (0.005) 


0.171 (0.005) 


0.193 (0.007) 


A 2 


0.042 


0.040 (0.001) 


0.041 (0.001) 


0.015 (0.005) 


0.032 (0.001) 


A 4 


0.009 


0.024 (0.001) 


0.024 (0.001) 


0.002 (0.0001) 


0.016 (0.001) 



Table 2: Simulation results: exact and estimated spectral moments by four different meth- 
ods. Standard deviations are in parentheses. 



The most critical assumptions for the theoretical results presented are that the noise 
process is stationary ergodic Gaussian and that the signal peaks have equal shape and 
are unimodal with compact support. The Gaussianity assumption was chosen because it 
enabled using a closed formula for computing the p-values associated with the heights of 
local maxima. For non-Gaussian noise, p-values could be computed via simulation and we 
expect that the error control properties should be preserved, although this does not follow 
directly from the proofs presented here. 

The assumption of compact support for the signal peaks is necessary for the concept 
of true and false detection to be well defined. The unimodality assumption makes local 
maxima good representatives of true peaks. This is formally true asymptotically for high 
SNR, as the probability that a true peak is represented by one and only observed local 
maximum tends to one. Besides its practical interpretation, this property is also helpful 
technically in the proof of FDR control. We do not find the assumption of asymptotically 
high SNR restrictive in the sense that the search space is allowed to grow exponentially 
faster. Another way of seeing this is that the SNR need only grow logarithmically with 
respect to the search space. It was shown in the simulations that moderate SNR suffices 
for good performance. 

The assumption that the peaks have equal shape is technically convenient as it allows 
reducing the asymptotic analysis of many peaks to the analysis of any single one of them. 
It also simplifies the concept of an optimal bandwidth, as it is the same for all peaks. This 
assumption is also a realistic one in many practical situations, such as the neural spikes 
example presented. 

Notice that there is no assumption of sparsity of peaks in the theoretical results. The 
simulations showed that the error rates and power do not suffer much even if the peaks have 
some overlap. Sparsity is not needed as long as the spectral moments of the noise process 
are known. However, sparsity is needed so that these moments can be estimated from data, 
as suggested in the data analysis section. 

A technical issue to consider in practice is that the theory was developed for continuous 
processes, while in simulations and real data the observations are obtained in a regular 
discrete grid. The theoretical results carry through approximately if the grid is fine enough, 
but break down if the smoothing parameter 7 is less than the grid spacing. As it is well 
known in signal processing, sampling before smoothing, inevitable in practice in Step 1 
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of the procedure, introduces aliasing. Convolution of a sampled discrete sequence with a 
sampled discrete kernel is not equivalent to sampling the convolution of a continuous process 
with a continuous kernel, and the approximation gets worse as the smoothing parameter 
7 gets close to 1. For this reason, we did not include values of 7 between to 1 in our 
simulation study, and we generally do not recommend to smooth using a value of 7 that is 
too close to the grid spacing, since for that case the theory is not valid. 

From a broad perspective, we see the methods presented in this paper not only as a 
solution to the peak detection problem but as an extension of multiple testing paradigms to 
temporal and spatial domains. While FWER methods for random fields and have been well 
established, particularly in neuroimaging (Worsley et al., 2004), similar extensions of FDR 
methods have proven to be more difficult Pacifico et al. (2004a,b). Other related approaches 
include testing for a spatial signal in the wavelet transform domain (Shen et al., 2002) and 
FDR for pre-defined spatial clusters (Benjamini and Heller, 2007). Standard FDR methods 
can be applied in a discretized spatial domain (Genovese et al., 2002) but are inherently 
designed for discrete units and ignore the spatial structure of the data. Instead, it has been 
argued by Chumbley and Friston (2009) that in the case of smooth spatial signals, inference 
should be about topological features, such as cluster volume or peak height. These authors 
and others (Zhang et al., 2009) have focused on cluster volume. Our worked has focused 
on peak height. 

Potential extensions of this work include exploration of the role of sparsity in the esti- 
mation of the noise parameters, as well as adapting the procedure to situations where the 
observed process is not Gaussian, where the peaks do not have a constant width, or where 
the domain is two- or three-dimensional, as in medical image analysis. 

6 Proofs 

6.1 Gaussian autocorrelation model 

Lemma 6.1. Let w u {t) = (l/v)4>(t/v), where <f>{t) is the standard normal density. 

1. For 7, v > 0, 

Wj(t) * w u {t) = w^(t), with £ = \/ j 2 + v 2 . 

2. Let w^(t) and w^(t) denote the first and second derivatives of w^(t) with respect to t. 
Then 




Proof. 

1. Let X ~ A^(0, 7 2 ),y ~ N(0,v 2 ) with respective densities w^(t) and w v (t). Then 
X + Y ~ iV(0,7 2 + v 2 ) with density w y (t) * w u {t) = w^(t), £ = yV + v 2 . 

2. Let w^ k \t) denote the fc-th derivative of w^(t) and let H^{t) denote the k-th Hermite 
polynomial. Then 

/: w - /: p£* (1) h m 2 - ^ /: « (i) * © *■ 
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But 



V2t 



2vr 



Thus replacing in the integral and making the change of variable x = y/2t/£, we obtain 

The results of the lemma are obtained by setting in particular k = with Hq(x) = 1, 
k = 1 with H\{x) = x, and = 2 with H2(x) = x 2 — 1. 



□ 



Derivations for Example 2.5 (Gaussian autocorrelation model) 

Under the complete null hypothesis, we can write 

/OO POO 
w u (t - s)dB(s) = a W(.(t - s)dB(s) (32) 
-oo J —oo 

with £ = 1/7 2 + i/ 2 , where we have used Lemma 6.1 part (1). By Lemma 6.1 part (2), 

/oo a 2 

roc ^ 

A 2 = E[x 7 (t)] = a 2 J w 2 (t -s)ds = 



1 —00 
coo 



f 00 3cr 2 
A 4 = E[x 2 (t)} = a 2 J w 2 (t -s)ds= g-^. 

6.2 Proof of Theorem 2.10 (Weak control of FWER) 

Lemma 6.2. Let u^ on and ubou be the thresholds defined in (19) and (20), respectively. 
Then \ubou — u BoD \ — > in probability as L — > 00. 

Proof. Recall that rfiQ is the number of local maxima belonging to the set So m the segment 
[— L/2, L/2]. Denote by mo[0, 1] the number of local maxima belonging to the set So m the 
unit. Using the notation of Lemma 2.4, we have that by ergodicity, 



^-£[m [0,l]] 







in probability as L — > 00, where £?[mo[0, 1]] does not depend on L. Since log(-) is continu- 
ous, the continuous mapping theorem gives that 



log 



m 
L 



log ^[mo [0,1]] 







, m o , E[rh ] 
log log 

a a 



0, 



where we have used the additive property of the logarithm. 

Define now the monotone increasing function ^ 7 : M + — > R by ip 1 {x) = F~ 1 {1 — e~ x ), 
x > 0, where k > is a constant. The function ipj(x) is Lipschitz continuous for all x > 1 
because its derivative dtp 1 (x)/dx = e~ x / 'F 7 [^ 7 (x)] is bounded for all x > 1. Hence, 



^7 lo. 



mo 
a 



- 1p<y l0. 



E[m ] 



a 



0, 
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7 1 E[rho] 



implying that 

as L — > oo. Multiplying by <r 7 gives the result. 
Proof of Theorem 2.10. 

1. By Proposition 2.3 and Lemma 2.4, 

FWER(t&J = P{V(u* BoQ ) > 1} < E[V(u Bon )] = LE[m(u Bon , [0, 1])] 

,EW^;[o,i])] 



□ 



LE[m(-oo; [0, 1])] 



E[m(-oo;[0,l])] 



E[ra ]P 7 Bon ) 



by the Lemma 2.4, part (2). Setting P 7 (« Bon ) = a/E[m ] gives that FWER(«^ on ) < 
a. 

2. Write 



FWER(uboii) = P < T / and maxx 7 (t) > u BoD + (MBon - u Boi 
{ tef 

and apply the fact that for any two random variables X, Y and any two constants c, 
e: 

P(X > c + e) - P(|Y| > e) < P(X > Y + c) < P(X > c - e) + P(\Y\ > e). 
Taking X = max^ z 7 (i), Y = tt B on - u Bon and c = u Bon> 
FWER(iiBon) < P |t / and 

The second summand goes to in probability as L — > oo by Lemma 6.2. For the first 
summand, we follow a similar argument to the proof of part (1): 

P If jk and maxx 7 (t) > u* BoQ - s) < E[m ]F> Bon - e) = a ^ff™ - g) 



but the last fraction goes to 1 as L — > oo. 



□ 



6.3 Proof of Theorem 2.11 (Strong control of FWER) 

Lemma 6.3. Assume the model of Section 2.1 and the procedure of Section 2.2. Then as 
L — > oo and aj — > oo, Vj such that vj, La,j(f)(Kaj) — > 0, for any constant K, 

1. The expected number of local maxima in the transition region §i )7 \§i = §o\§>o, 7 tends 
to 0: 



E 



#{tern(S li7 \s 1 )} =e #{t e fn (S \§ , 7 )} 



o. 



2. The probability of obtaining a local maximum in the transition region tends to 0: 

p(#{t ern(S \So, 7 )} > 1) ->o 
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Proof. 



1. The expected number of local maxima in any set T C [-L/2, L/2] can be computed 
by the Kac-Rice formula: 

E G T : ± 7 (i) = 0, x 7 (t) < 0}] 

= f P(x-y(t) = 0) f \y\p(x 1 {t) = y) dydt 

JT J-oo 

= / P(* 7 (*) = ~A 7 (*)) / yp(z-y(t) = -y- ily(t)) dydt, (33) 

JT JO 

where p(-) denotes probability density. Recall that z 7 (i) ~ iV(0, A4 ;7 ). The inner 
integral in (33) is 



/•OO 

/ yp(zj(t) 

Jo 

L 



-y - A 7 (*)) % = / i/ 



l 



-/t 7 (t) + \/A4 



\Avr z ~ A 7 (*) 

/ A 7 (*) \ 



\A^7 



-v - A 7 (*) 



4,7 



4,7, 



A 7 (t)$ 



A 7 (*) 



4,7. 



(34) 



The set T can be decomposed as the union T = T + UT , such that /i 7 (t) > 0, Vt G T + 
and /i 7 (i) < 0, Vt G T _ . Using the inequality 



^ 1/ \ T / \ 0(X) 



1 + x^ 

for i G T + , the expression in (34) is bounded by 



x > 0, 



(35) 



< /i 7 (i) 



1 



< 



A 7 (*) 1 v-V- 



l + /i 7 (t)/A 4 , 



,(36) 



For t G T the sum of the last two terms of the expression in (34) is bounded by 



< 



At(0 



4,7 



+ A 7 (*) $ 



-flj(t) \ 



a/ ^4,7 y 
-1 + $ 



4,7 



< 



-/i 7 (£) \ v^Vr 



/ 1 + A7W/A4,- 



(37) 



Taking T = Si )7 \ Si = Uj^Sj^ \ Sj and decomposing each transition region as 
Sj,i \ Sj = (Sj,-y \ U (Sj a \ Sj)~ and replacing in (33) we get 
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< E[#{t G Si i7 \Si :i 7 (i) =0,z 7 (t) < 0}] 



= Jl 



L 



+ 



+ 



viz-fit) = -A 7 (*)) 



-/i 7 (t) + y/\4,y(f> 

A 7 (*) 



/i 7 (t) 



+ /i 7 (t)$ I 



/i 7 (t) 



4,7 



(5j i7 \5j) 

The last integral in (38) equals 



p(i 7 (t) = -/i 7 (t)) [-/i 7 (t)] eft 



< 



(Sj tl \Sj) 



I, 



P i z iit) = -A 7 (*)) [-A-yW] d * 

1 expf-^^) (-jl 7 (t))dt. 



(s jrf \s,)- \72vrA 2 , 7 



2A 2 - 



(39) 



Since /ii 7 (i) is piecewise continuous, we can express the set (Sj i7 \ <Sj)~ as the finite 
union of closed intervals [cj,dj], some to the left of Tj )7 and some to the right. Note 
that /i 7 (t) does not change sign for all t G [ci,di] and it is a decreasing function, since 
/2 7 (i) < 0. Therefore, /i 7 (cj) > /i 7 (<ij). The integral in (39) can be expressed as the 
finite sum of integrals of the form 



= $ 



fij(di) \ 



exp 



A 7 (*) 2 

2A 2 , 7 
A 7 ( c ») ^ 



-/i 7 (i)] dt = 4>(z) dz 

J (di ) 



<<t> 



V 7 ^ 



(40) 



where Qi = di if the interval is to the left of Tj i7 and g, L = Ci if the interval is to the 
right of Tj rr Using the above result, the integral in (39) is bounded by: 



L 



(Sj tl \Sj) 



P 7 (*) = ~A 7 (*)) [-P"y(t)] dt ^ 



Ah, 



A 7 (ff* 



(41) 



where 5* = argmax^ (/i 7 (<?j)/\/^2,7) an d A/i 7 = ^ 7 ( c i) ~~ h^(di) is the maximal 
difference. Note that the bounds in (36) and (37) are the same. Plugging the bounds 
(36), (37) and (41) in equation (38) we get 



< E G §i i7 \ Si : Xj(t) = 0, x y (t) < 0}] 

( A 7 W 



< Jl< 



+ 



(A 7 (s*)/V^) 



■\ A/i 7 



(42) 



For t G Sj t -y\Sj, Vj, ^ 7 (i) = djh-y^t — Tj^). h-y(t) is bounded away from 0, in particular 
|/i 7 (t)| is bounded in Sj^\Sj. Let i* be the point in Sj^\Sj where |/i 7 (i)| is minimal. 



(3 
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Continuing (40), the expected number of local maxima in Si i7 \ Si is bounded above 
by 



Jt 



< Jl 



< La 



1 



y/X 



2,7 



^2,7 



A 



2,7 



(ljLy(t*) 

ajh-f(t*) 



A 7 (*) 



2,7 



I ^',7 \ I + 



A 4 / l + /i2(t)/A 4 



lit + 



ajh^{g*) \ Ah^ 



Ah,, 



'A 2 , 7 / a,jhy(t*) 



ajh^(g* 



AK 



V ^2,7 y i/A 2 , 7 
This bound goes to by the lemma's conditions. 
2. Immediate from the previous part. 



□ 



Proof of Theorem 2.11. 

1. By Proposition 2.3 and Lemma 2.4, 

FWER(u Bon ) = P (V> Bon ) > 1) = 1 - P (V(u Bon ) = 0) 
= 1 - P ([#{/ G Tn S ,7 : maxx 7 (t) > u Bon } + #{t G Tn (S \ §0,7) : maxx 7 (i) > u Bon }] = 0) 

= 1 - P (F 7 K on ) = and #{t G f n (So \ §0, 7 ) : maxx 7 (t) > u Bon } = 0) 

< P(F> Bon )>i) + p(#{ieT n(So\So l7 )}>l). (43) 

The last inequality holds since 1 - P(A n B) < P(A C ) + P(B C ) and 

P (#{t G f n (So \ S 07 ) : maxx 7 (t) > u Bon } > l) < P (#{t G T n (S \ S , 7 )} > l) . 

The second probability in (43) goes to by Lemma 6.3, part (2). 

Although the process x 7 (t) is not stationary under the true model, it has the same 
properties as stationary process on the set So, 7 , since x 7 (t) = z 7 (i) for all t G §0,7- 
Thus, following the same arguments as in the first part of Theorem 2.3 we get that 
the first probability in (43) is bounded by a. The second probability goes to by 
Lemma 6.3. 

2. Let mo i7 be the number of local maxima belonging to the set So )7 and let v 7 = 
F~ l (01/7710,7). ft * s c l ear that £ 7 < tiBon since rrio,7 < fh, therefore 



FWER(«Bon) = P {V{uVon) > 1) < P (V(v y ) > 1) 

< p (v^) > i) + p (#{t g f n (So \ s 0i7 )} > l 



(44) 



by an argument similar to that in (43) . Following the same arguments as in the proof 
of Theorem 2.10 part (2), the first probability in (44) is bounded by a as L — > oo. 
The second probability goes to by Lemma 6.3. 



□ 
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6.4 Proof of Theorem 2.12 (Control of FDR) 

Lemma 6.4. Recall from Section 2.1 and 2.2 that fJ,~/(t) = ajhj(t — Tj ;1 ) with finite support 
Sj a and h y (t) has a unique local maximum at t — 7j, 7 that is an interior point of Sj C <Sj j7 
and has no other critical points. Let < e < 1 and Ij t£ = [rj )7 — e, Tj 7 + e] C Sj C <Sj, 7 . 
Then, as aj — > 00 and e — > such that ajh 7 (—e) — > 00 and ajh 1 {e) — > —00, 

1. p(#{t€f n/ j>e }>i) ->i 

2. P (#{t 6 Tfl e : x y (t) > u] > l) ->■ 1 /or every fixed threshold u. 
Proof. 

1. The probability that x 7 (t) has some local maxima in I £ is 



1 > p {#{t £ T n J j>e } > lj > P {x 7 (r ii7 - e) > and x 7 (r ii7 + e) < 0} 
= P {i 7 (rj i7 — e) > — /i 7 (rj i7 — e) and z~ f (jj ri + e) < — /i 7 ( T j,7 + e)} 
> 1 - [P (i 7 (r j)7 - e) < -fij(rj,j - e)) + P (i 7 (r j)7 + e) > -/i 7 (r ji7 + e))] 



> 1 



P I inf z 7 (i) < — /i 7 (Tj i7 — e) I + P sup i 7 (£) > ~f JL i{ T j,i + e 

\[Tj, 7 -e,T,-, 7 ] ' " / \[ +e ] 



P sup — i 7 (i) > a,jhy(—e) + P sup £ 7 (i) > —ajh 1 {e) 



(45) 



The probability that the supremum of any differentiable random process, f(t), is 
above u is bounded by (Adler and Taylor, 2007). 

P( sup f(t)>u) <P(f(0)>u)+E[N u }^G(u,\T\,a 2 ), 
\te[o,T] J 

where N u is the number of up-crossings by / of the level u in the interval [0, T] and a 2 
is the variance of the process. For the stationary Gaussian process £ 7 (i) with mean 
and variance A2, 7 , applying Kac-Rice formula gives: 

E[N U ] = \T\ p(z 7 = u ) J Q x P(h = x ) dx = ex P (~ ) 



A4 . 



\2, 7 



In particular, 



sup -z 7 (t) >u) < 1 - $ 



u 



l T j>7 £ > T j,l\ 



< 



-\/^2, 7 / U 



27r ^ v 2A2, 7 y y A2 i7 



A4 . 



G(u,£,A 2)7 ). 



+ £4 



\4, 7 



2vrA 



2,7 I 



(46) 
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The bracketed expression in (45) can be bounded by 

< P sup —Zy(t) > aj/i 7 (— e) + P sup z y (t) > — a-,/i 7 (e) 

\[Tj, 7 -e,Tj l7 ] / \J r -?',7> r .j,7+ £ ] / 

< G (ajLy(—s),£, A 2 , 7 ) + G (—ajLy(s),e, A 2 , 7 ) . (47) 

Both terms in (47) go to 0. Thus back to (45), P(#{i 6 Tn 7, )£ } > 1) -> 1. 

2. The probability that at least one local maxima in Ij e exceeds the fixed threshold u 
satisfies 

P ^{t g f n i" j;£ : x 7 (t) > u} > > P ^inf x 7 (t) > = P ^inf [/x 7 (t) + z^t)} > 
> P I inf fij(t) + inf z 7 (i) >u = P I inf z 7 (i) > u — a,j inf /i 7 (i)) I 

= P sup — z 7 (i) < aj inf /i 7 (i) — u = 1 — P sup — z 1 {t) > a,j inf /i 7 (i) — u . 

V 'i-c ' J V ' ' / 

The infimum of hj(t) in the range Jj j£ occurs at one of the edges. Without loss 
of generality we assume that infj\ e /i 7 (i) = /i 7 (e). Using the upper bound given in 
equation (46), applied to the interval Ij j£ and z 7 (i) rather than i 7 (t), yields 

P (#{t £ Tn l, h£ : x 7 (t) > u} > l) > 1 - C e («), (48) 

where 



ajh 7 (e) — u \ a 7 / y 

For any constant u the convergence rate of a-,7i 7 (e) — u is the same as that of ajh 1 {e). 
Therefore, C £ {u) — > as a,j — > oo for any fixed threshold «. 

□ 

Lemma 6.5. Assume the model of Section 2.1 and the procedure of Section 2.2. Let mi i7 = 
#{Tn §i, 7 } be the number of local maxima in the set Si i7 and recall that W^(u) = #{t G 
T n §i j7 : x 7 (i) > u} is the number of local maxima in §i j7 above threshold u. Under the 
assumptions of Theorem 2.12, 

1. The probability to get exactly Jl local maxima in the set Si, 7 , P (w-i, 7 = Jl) = 
P (#{fn Si )7 } = Jl) briefs to L 

2. TTie probability to get exactly Jl local maxima in the set §i j7 that exceed the threshold 
u, 

P (W 7 (u) = J L ) = P (#{* 6 Tn Si )7 : x 7 (t) > u} = J L ) -»• 1 
tends to i, /or any /ixed threshold u. 

3. rhi^/L — > A\ in probability. 
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4- W 7 (u)/mi i7 — >■ 1 in probability. 
Proof. 

1. The probability to get exactly Jl local maxima in the set §i )7 is 

1 > P (#{t G f n Si i7 } = Jl) = P (#{t G T n (u/4i% 7 )} = ^) 

(n/^{#{iGf n5 ji7 } = i}) 



> p 



> P (n£i#{t g f n/ jie } = i), 
>i-^[i-p(#{teTn/ 3 , £ } = i) 



(49) 



Combining equations (45) and (47), the above expression is bounded below by 

Jl 

1 - ^ (g (ajhy(-£),£,X 2 ,j^ + G (-ajh 7 (£),£, A 2 , 7 )) 



> 1 - J L (g (a*Lf(-£),e, A 2 , 7 ) + G (-a*/i 7 (e), e, A 2)7 )) 





\A 2 ,7 


a*hj(— e) 


/ ^4,7 




V 27rA 2,7 





a*hj(— e) 



+ 



a*/i 7 (e) 



a*/i 7 (e) 

7^" 



2,7 



+ 



i*/i 7 (e) I 



a*/i 7 (— e) 



2,7 



(50) 



where a* = argmax{G(aj/i 7 (— e), e, A 2j7 ) + G(— aj/i 7 (e), e, A 2j7 )}. 

The fastest rate of convergence toward zero, for which the requirement is still valid, 
is achieved when /i 7 (±e) converges to zero as a^^^ s \ < 5 < I. The lemma's 
conditions guarantee that expressions in (50) go to in the least favourable case. 

2. Following the same computations as in equation (49) and using the bound (48), the 
probability to have Jl truly rejected hypotheses is bounded by 



1 > P ( #{t 6 Tfl §i j7 : x-y(t) > u} = Jl 



> 1 



/<7- 



Jl 



ajh^(e) — u 



djh~(£) — u 



cr- 



•^2, 7 , ( djh^ie) — u 



2ira* 



0\ 



Following the same arguments as those in the last paragraph in the first part of this 
lemma, the expression in parentheses goes to for any fixed threshold u. 



3. Since 



mi. 



mi 7 J L 



L J L L' 

we need to show that rh\j 1 Jl — > 1 in probability. For any fixed e > 

mi i7 



< P 



Jl 



1 



> £ = P (|mi i7 - J L \ > J L e) < P (mi, 7 ^ J L ) = 1 - P (m lj7 = J L ) 



since fh\ n and Jl are integers. The probability to get exactly Jl local maxima goes 
to 1 by the part (1) of this lemma. 
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4. By part (2) of this lemma P(W / 7 (ii) = Jl) — > 1 in probability, therefore, using the 
same arguments as in part (3) of this lemma, we get W 1 {u)/Jl — > 1. Now, 



mi j7 Jl rhi n 
Both fractions go to 1 by the previous parts of this lemma. 

Lemma 6.6. For any fixed threshold u, and any positive integer J 
where W(u) = #{t £ f f] Si : x 7 (t) > u}. 

Proof. For simplicity, in this proof we omit the argument u, since it is fixed. 

/ v =0w=0 V 7 

OO J—l s x CO OO ✓ \ 

= EE ^ p (^=^-)+EE ^ p (^ = ^ = -) 

i)=0m=0 v 7 u=0uj=J v 7 

J—l OO OO OO / \ 

<EE p ( y = ^ = ti; ) + EE (^7) p ( y = v ' w = w ) 

w=0v=0 v=0w=J 
J—l 00 / v 

= £p(w = w ) + £(-^Jp(y = t,,w> j) 



□ 



w=0 u=0 

00 , <. 

<P(^< J-l) + ^f^ 7 jP(y = t;) 

ri=0 



y \ e(v) 

P (W < J - 1) + E < P (W < J - 1) + 



V + JJ- v ~ 7 E(V) + J 

The last inequality holds by Jensen's inequality, since V/ (V + J) is a concave function of 
V for V > and J > 1. □ 

Proof of Theorem 2.12. Let -R(ubh) = ?ti — + 1 be the number of rejected hypotheses 
using the threshold ubh- Let 

^ = #{s 7 (*) > M,a 7 (t) = 0,x 7 (t) < 0} 

be the empirical marginal right cumulative distribution function of Xy(t)\t £ T. Then 

~ _ #{x 7 (t) > u B K,x~,(t) = 0,x\(t) < 0} _ R(ubh) 

lUBHj ~ #{x 7 (t) = 0,x 7 (t) <0} ~ rh ' 

Multiplying both sides of this equation by a we get 

o>G(ubh) = a R ^ BH ^ = Fj(u B h). 
m 
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This result implies that the BH threshold ubk is the largest u that solves the equation 

aG{u) = F 7 (n). (51) 

The strategy is to solve equation (51) in the limit when L,a,j — > oo. We first find the 
limit of G(u). Letting W 7 (u) = #{t £ f n §i, 7 : x 7 (t) > u} and V y (u) = #{t £ f n §0,7 : 
x 7 (i) > u} , 

^, . _ #{x 7 (t) > tt,x 7 (t) = 0,a 7 (t) < 0} _ V y (u) + W 7 (m) 
#{x 7 (i) = 0,x 7 (t) < 0} ~ mo + mi 
_ V^Qu) m 0i7 ^ W 7 (k) mi, 7 

"T-0,7 "T-0,7 + "T-1,7 ™1,7 "20,7 + ^1,7 

Recall that x 7 (i) is ergodic, thus by the weak law of large numbers and Lemma 6.5 part 
(3), 

rap, 7 _ m 0[7 /L ^ E[m 0;7 ; [0, 1]] 

mo, 7 + 77ii, 7 mo, 7 /L + mi j7 /L E[mo, 7 ; [0, 1]] + -Ai 
Following the arguments in Lemma 2.4, for L — >■ oo, 

^^™ = F>). (54) 
m , 7 E[mo )7 ] 71 7 V ^ 

Finally, by Lemma 6.5, part (4) 

IT". ( i A 

(55) 



, as L — > oo. (53) 



W 7 (u) 



mi. 



as L — >■ oo and a, — )■ oo such that L<f>(Ka,j) — > 0. 

Combining equations (53), (54) and (55) with Lemma 6.5 part (4) in (52), we obtain 



E[m , 7 ; [0, 1]] + A x E[m , 7 ; [0, 1]] + A 1 ' 

Now replacing G(u) by its limit in (51), we obtain 

/ E[m ;[Q1]] + Al \ = (u) 

V 71 ^ E[mo, 7 ; [0, 1]] + Ai E[m , 7 ; [0, 1]]+Aj l{ h 

leading to the solution 

F>) = A 1+ E[mo, 7 ;[0,l]](l-a)- (56) 

Note that F^(u) is convex for u > 0, therefore exists a unique solution to equation (56). 
Let «g H be the solution of equation (56). 

The FDR at the threshold u BR is bounded by Lemma 6.6 by 

FDR(^h) < P(W(u* BH ) <J L -1) + E[] l " hu !| 



E [V(u BH )] + J L 

= P(W( Ubu ) < J L - 1) + 



E[Wh)]+E 


#{t G f n (So \ So )7 ) :x 7 (t) > u* BH } 






#{tGTn(So\So, 7 ) > u *br} 


+ Jl 



where we have split V 7 (u BH ) into the region §0,7 and the transition region Sq \ §0,7- 
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When both L and a,j go to infinity such that Lcp(Kaj) — > 0, Lemma 6.5 gives 



< E 



< E 



#{tern(S \§o, 7 )} 



o. 



#{t ern(S \So, 7 ) :x 7 (t) >u* BH } 

By (54) , the remaining terms of the fraction can be written as 

E[^Kh)] F>* H )E[m o , 7 ;[0,l]]L F>* H )E[m 0)7 ; [0, 1]] 



(57) 



E [Vy(u* BH )] + J L F 7 («* H )E[m 0)7 ; [0, l\\L + J L F 7 (u* H )E[m 0)7 ; [0, 1]] + J L /L' 
Since u BH solves (56), for L — > oo such that Jl/L — > A\ the above expression tends to 



aE[m , 7 ; [0, 1]] 



a- 



E[m 0j7 ; [0, 1]] 



< a. 



(58) 



aE[m 0l7 ; [0, 1]] +A 1 + (1- a)E[m , 7 ]; [0, 1] E[m , 7 ; [0, 1]] + A l 

Combining equations (57), (58) and Lemma 6.5 part (3), we obtain 

limsupFDR(u BH ) < a 

as L and a,j go to infinity such that L<j)(Kcij) — > 0. 

Recall that the BH threshold ubh solves the equation (51), and Ug H solves the equation 
(56), where the empirical marginal distribution, G(u), is replaced by its limit. Since -F 7 (i) 
is continuous 

F 7 (ubh) -> F 7 Kh)> (59) 

leading to 

limsupFDR(-UBH) < ex. 

□ 



6.5 Power 

Proof of Lemma 2.14. 

1. Let us show first that F 1 (ajh 1 {Q)) / F 1 {u Bon ) — > 0. To show this, we bound -F 7 (x) 
below by 



-^4, 7 C" 7 

An upper bound is given by 
Fj(x) < 1 - $ I — I - 



1 

2A 4j7 o-2 ' 



(60) 



A4 i7 £j2 



< 



(7, 



A4, 7 0" 7 



+ 



0\ 



.X 



where the first inequality uses the fact that A4 i7 /A > l/<r 7 , and the second follows 
from (35). Assuming that x is large enough, say x > <r 7 , the upper bound takes the 
form 



F 7 (x) < C 2 <t> — , 



C 2 



'27rA| 7 

A4, 7 C J 7 



+ 1. 



Thus 



^(0^(0)) < C2 lE[m]^ ^a^ 7 (0) 



^ 7 («Bon) 



(7„, 



(61) 
(62) 
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which goes to by the conditions of the lemma and by the fact that E[m]/L = 
E[m; [0, 1]] does not depend on L. 

Define now v = F^(u Bon ) and w = F 1 (ajh 7 (0)). Inverting the bounds (60) and (61), 
we get bounds 



Crf I ^on ] < y < Q 



for u Bon and similarly for a-,/i 7 (0). Thus, 



< (^Bon) 2 < ln(C 2 /v^) - ln( V ) 



(aj/iyCO)) 2 " ln(d/V2^) - \n(w) ' 

Applying L'Hopital rule, the limit of the above fration when v and tt; go to zero is the 
same as the limit of w/v, which is by (62). 

2. In contrast to the Bonferroni threshold, the asymptotic FDR threshold depends only 
on the proportion of false null hypotheses. Because of the assumption of asymptot- 
ically fixed proportion of true peaks, Jl/L — > A\ with < Ai < 1, the limit of the 
FDR threshold (56) is fixed and finite. This leads to the result. 

□ 

Proof of Theorem 2.13. 

Let u be any fixed threshold and let e and Ij )£ be defined as in Lemma 6.4. Then, by the 
definition (26) and the bound (48) in Lemma 6.4 , 

1 > Power(u) > P [#{t G I j)£ : x y (t) > u} > 1] > 1 - C £ {u) 

Combining the results of Lemmas 2.14 and 6.4, C £ {u) — > and the power of Bonferroni 
and BH procedures when using the deterministic thresholds u^ on and iig H converges to 1 
in probability. 

It was proven earlier that the gap between the random and deterministic thresholds 
of the Bonferroni and BH procedures goes to (see part (2) of Theorems 2.10, 2.11 and 
(59)). Therefore, the power when using random thresholds UBon and ubh converges to 1 as 
well. □ 
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